We will analyse 10x Genomics Visium spatial transcriptomics (ST) data
from a tissue slide of high-grade serous ovarian
carcinoma (HGSOC).
Ovarian cancer remains the eighth leading cause of cancer deaths in
women worldwide and high-grade serous ovarian carcinoma (HGSOC) is the
most common and lethal histologic subtype.
Genomes of HGSOC are highly heterogeneous with most
alterations only found in a small fraction of tumours.
Also, due to a high degree of chromosomal instability,
most HGSOCs are polyclonal. As the cancer progresses and metastasises,
clonal diversity increases, which is associated with
worse prognosis and development of
chemoresistance.
Standard first-line treatment for HGSOC typically consists of debulking surgery, which involves removal of as much of the tumor as possible, followed by intravenous paclitaxel/platinum-based chemotherapy, and often subsequent maintenance therapy.
Investigating tumor heterogeneity, The Cancer Genome Atlas (TCGA)
project revealed four molecular subtypes of HGSOC based
on bulk expression measurements: mesenchymal, proliferative,
immunoreactive, and differentiated. These subtypes are associated with
differences in prognosis.
It is now generally accepted that transcriptional subtypes of HGSOC
largely reflect the degree of immune cell infiltration and the
abundance of fibroblasts, rather than inherent differences in
tumour cells.
To determine how these non-malignant cell types might influence tumour
growth and prognosis, ST data hold the potential to reconstruct
ligand-receptor interactions between stromal, immune and tumour cell
populations, resolving the tumor tissue architecture at spatial
resolution.
In this hackathon we will analyze data of tumour sample collected during interval debulking surgery from HGSOC patient with a good response to taxane- and platinum-based neoadjuvant chemotherapy (NACT) treatment.
Data were retrieved from Denisenko, E. et al., Spatial transcriptomics reveals discrete tumour microenvironments and autocrine loops within ovarian cancer subclones. Nat Commun. (2024)
Before starting, let’s set the stage to be all on the same
ground.
You have been provided with configuration files. Please select “your”
environment and the “helper” functions
source('../00_environment.R')
source("../00_helper_functions.R")
source("level_1_libraries.R")
To create the SeuratObject representing the ST data of
the sequenced slide we will rely on information stored in the
input folder. This folder contains raw counts
and the hematoxylin and eosin (HE) image of the tissue slide:
| What | Path to |
|---|---|
| raw counts | ../input/10XVisium/ |
| HE image | ../input/10XVisium/spatial/tissue_lowres_image.png |
In raw counts folder there are three files usually
provided by 10x Genomics Visium:
matrix.mtx:
Count matrix is usually stored as a sparse format for a more efficient
manipulation.
Rows: genes
Columns: barcodes associated to each spot.
features.tsv:
This file contains specifications of rows included in the count matrix
file.
For each feature, the following information are reported:
feature ID, e.g gene ID
“ENSG00000284733”feature name, e.g gene name
“OR4F29”feature type which will be one of the
following options:barcodes.tsv:
This table contains barcode sequences corresponding to column indices of
the matrix file.
Each barcode sequence contains a suffix with a dash separator followed
by a number, such as ‘-1’.
Referring to paths reported in the table above, load 10X Visium data
in a SeuratObject.
A SeuratObject is a specialized data structure developed in
the Seurat package,
enabling the study of gene expression in the context of tissue
architecture.
This SeuratObject is a container that organizes and stores
both the spot-level expression data along with the associated image of
the tissue slide.
Now, create a SeuratObject containing spot-level gene
expression counts and the image of tissue slide; in order:
Read 10X Genomics Visium count matrix using
Read10X() function
Create a SeuratObject with the count matrix with
CreateSeuratObject()
Read 10X Genomics Visium image through
Read10X_Image().
Be careful to include only spots determined to be over tissue.
Questions
How many spots do you consider in the assay?
How many genes are included in the panel?
By using the Read10X() function, we load the sparse data
matrix of gene expression.
data.dir specifies the directory containing the
file.gene.column refers to the column number in the
feature.tsv file where the gene names are stored.cell.column indicates the column number where the cell
names are specified in the barcodes.tsv file.unique.features is a logical parameter to specify
whether to make gene names unique and avoid redundant names.strip.suffix is a logical parameter (default =
FALSE) to remove the suffix ‘-1’ from all spot’s
barcodes.counts <- Read10X(data.dir = "../input/10XVisium/"
, gene.column = 2
, cell.column = 1
, unique.features = TRUE
, strip.suffix = FALSE) # KEPT to match spot in the img (the reading function does not trim the "-1")
str(counts)
## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
## ..@ i : int [1:5944351] 39 52 54 71 102 154 216 236 277 340 ...
## ..@ p : int [1:2125] 0 897 5090 7437 9411 14604 15773 17255 19894 21407 ...
## ..@ Dim : int [1:2] 33538 2124
## ..@ Dimnames:List of 2
## .. ..$ : chr [1:33538] "MIR1302-2HG" "FAM138A" "OR4F5" "AL627309.1" ...
## .. ..$ : chr [1:2124] "AAACAAGTATCTCCCA-1" "AAACAGAGCGACTCCT-1" "AAACAGTGTTCCTGGG-1" "AAACCGTTCGTCCAGG-1" ...
## ..@ x : num [1:5944351] 1 2 1 1 1 1 1 1 1 1 ...
## ..@ factors : list()
Now we create a Seurat object from raw counts data. To
do so, we need to specify in the function
CreateSeuratObject():
counts: matrix-like object with raw counts.project: the project name for the Seurat
object.assay: the name of initial assay.The resulting sp object contains:
orig.ident: the identifier of the sample.nCount_spatial: the total number of counts per
cell/spot.nFeature_spatial: the number of detected features
(genes) per cell/spot.sp <- CreateSeuratObject(counts = counts
, project = '10XVisium'
, assay = 'Visium10x'
)
DefaultAssay(sp) <- 'Visium10x' #The default assay is set to `Visium10x`
Let’s include the low-resolution tissue image in the
Seurat object.
Now we retrieve the image metadata from 10x Genomics Visium experiment
and create an img object.
Using the Read10X_Image function, load the 10X Genomics
Visium Image (reference) in
the img object to extrapolate the spatial 2D coordinates of
each spot.
image.name is the file name of the PNG image we are
loading in the assay.assay specifies the associated assay in the
Seurat object we are building.slice is the name for the current image.filter.matrix is a boolean option to retain only spots
in the count matrix that have been detected to be over the tissue
image.img <- Read10X_Image(image.dir = "../input/10XVisium/spatial/"
, image.name = "tissue_lowres_image.png"
, assay = 'Visium10x'
, slice = 'slice1'
, filter.matrix = TRUE
)
# Getting the cell (spots) names
sp_cells = Cells(sp)
# Retrieve spots associated with features in the `SeuratObject` `sp` from the image object `img`
img <- img[sp_cells]
# Add the image to the Default object `sp`
sp[["slice"]] <- img
The resulting sp object contains in
meta.data table:
orig.ident: the identifier of the samplenCount_spatial: the total number of counts per
spotnFeature_spatial: the number of detected features
(genes) per spotNow, you have to:
Measure the percentage of MT genes of each spot and
quantify the capture efficiency per spot.
The capture efficiency is defined as the ratio of number of
expressed genes (nFeature) and total counts
(nCount) per spot.
Add these two annotations in the meta.data table included in your
SeuratObject.
Assess the correlation between metrics included in
meta.data generating a pairsplot.
Create violin plots resembling QC metrics using
Seurat function VlnPlot().
Visualize QC features over the HE image exploiting
SpatialDimPlot() and
SpatialFeaturePlot().
Remove spots with very few transcript/gene counts.
By applying permissive thresholds, we exclude cells with insufficient
gene expression information, which cannot be used for downstream
analysis.
In particular, consider these two thresholds:
nFeature_threshold: compute the 25th percentile of
nFeature distributionnCount_threshold: compute the 25th percentile of
nCount distributionSelect spots having:
nCount > nCount_thresholdnFeature > nFeature_thresholdQuestions
Inspecting the QCs, how do you evaluate this experiment?
How many spots did you retained after applying filtering
strategy?
# Assess the percentage of mithocondrial (MT) MT genes
# The function PercentageFeatureSet() enables to compute the percentage of counts belonging to a subset of the possible features for each cell.
# We specify the pattern of interest in `pattern` selecting all genes starting with 'MT' in the gene name.
# The calculation is simply the column sum of the matrix present in the counts slot for features belonging to the set divided by the column sum for all features times 100.
mt_percentage = PercentageFeatureSet(sp, pattern = "^MT-")
# Adding the mt_percentage as metadata to the `SeuratObject`
# AddMetaData adds additional data to the object. It can be any piece of information associated with a cell (spot).
sp = AddMetaData(object = sp, metadata = mt_percentage, col.name = "percent.mt")
# Assess the capture efficiency
# the capture efficiency is measured as the ratio nFeature_Visium10x/nCount_Visium10x per spot.
# We add this annotation by creating a new column directly on the meta.data dataframe
sp@meta.data$transcriptome_capture_efficiency = with(sp@meta.data, nFeature_Visium10x/nCount_Visium10x)
# Test the dependencies of the stats;
selected_features = c("nFeature_Visium10x","nCount_Visium10x", "percent.mt"
, "transcriptome_capture_efficiency" )
# `pairs` creates a matrix of scatter plots for visualizing relationships between the selected features
# we specify to return the correlation estimate in the panel upper the diagonal with the parameter `upper.panel`.
pairs(sp@meta.data[,selected_features]
, lower.panel = panel.smooth
, upper.panel = panel.cor)
# `VlnPlot` is a function included in `Seurat` package that creates violin plots to visualize the distribution of features of interest.
# With the`feature` parameter we specifically consider the metrics included in the vector `selected_features.`
# On the x axis we have the currently active identity in our `SeuratObject`, e.g., `orig.ident`.
VlnPlot(sp, layer = 'counts', features = selected_features, pt.size = 0.1, ncol = 2) & theme(axis.title.x = element_blank())
# Let's see the statistics on the tissue slide:
# SpatialDimPlot and SpatialFeaturePlot are two main functions to plot features considering spatial information.
HE_slide = SpatialDimPlot(sp, pt.size.factor = 0.5)
FS_slide = SpatialFeaturePlot(sp, features = selected_features, combine = T, ncol = 4 )
patchwork::wrap_plots(HE_slide, FS_slide, ncol = 1)
# Filtering low-quality spots
nf_th = quantile( sp@meta.data$nFeature_Visium10x)['25%']
nc_th = quantile( sp@meta.data$nCount_Visium10x)['25%']
sp <- subset(sp, subset =
nFeature_Visium10x > nf_th &
nCount_Visium10x > nc_th )
sp
## An object of class Seurat
## 33538 features across 1567 samples within 1 assay
## Active assay: Visium10x (33538 features, 0 variable features)
## 1 layer present: counts
## 1 spatial field of view present: slice
Let’s define clusters of spots that share similar transcriptomic profiles.
Normalize counts using log-normalization method and find the top
2,000 variable features using
FindVariableFeatures().
Perform PCA dimensionality reduction computing first 50 PCs and scale data.
Find the 20-nearest neighbors of each spot using the function
FindNeighbors().
Identify clusters using PCA reduction as reduction
for the nearest-neighbor graph construction. Specifically, use the first
30 dimensions of PCA as input.
Visualize the identified clusters over the HE image using
SpatialDimPlot() function labelling each cluster with the
correspondent number.
Visualize the identified clusters over the HE image using
SpatialDimPlot() function labelling each cluster with the
correspondent number.
Collect in a list the spot names according to the correspondent
cluster using the function CellsByIdentities().
Plot the clusters separately over the HE image specifying in
cells.highlight parameter the list containing groups of
cells per cluster.
Questions
Which parameter in the function FindClusters()
impacts the most on the numbers of detected clusters? You can set this
parameter equal to 3 and see if you get a larger number of
communities.
How many clusters did you find?
How many spots did you get in each cluster?
What algorithm did you choose to optimize modularity? Motivate your choice.
# We perform standard log-normalization of count data
sp <- NormalizeData(sp, assay = "Visium10x", normalization.method = "LogNormalize")
# We compute the top 2,000 varibale feature according to mean expression and standard deviation of each feature
# As selection method we use 'vst' that is commonly used.
# First, fits a line to the relationship of log(variance) and log(mean) using local polynomial regression (loess).
# Then standardizes the feature values using the observed mean and expected variance (given by the fitted line).
# Feature variance is then calculated on the standardized values after clipping to a maximum equal to square root of the number of cells (default)
sp <- FindVariableFeatures(sp, selection.method = "vst", nfeatures = 2000)
# Scale and centers features
sp <- ScaleData(sp)
# Perform PCA dimensionality reduction computing 50 PCs
sp <- RunPCA(sp, assay = "Visium10x", reduction.name = "pca", npcs = 50)
#`FindNeighbors` function is used to compute the nearest neighbours (NN) for clustering, identifying the nearest neighbours of each cell based on the first 30 PCs.
# we use PCA reduction as input for the NN graph construction exploiting the first 30 dimensions of PCA.
sp <- FindNeighbors(sp, assay = "Visium10x", reduction = "pca", dims = 1:30)
#`FindClusters` identifies clusters in the data, maximizing resolution parameter we obtain a larger number of communities
# We employ Louvain algorithm for modularity optimization
sp <- FindClusters(sp, cluster.name = "seurat_cluster", resolution = 3, algorithm = 1)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 1567
## Number of edges: 62624
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.5699
## Number of communities: 19
## Elapsed time: 0 seconds
# The identified clusters are visualized over the spatial image: this helps to see the spatial distribution of different gene expression patterns.
Idents(sp) <- "seurat_clusters"
# retrieves and sort cells by their cluster identities, storing the result in the `cells` variable
# using CellsByIdentities function we generate a list containing as many elements as the clusters. Each element includes the spot names of the correspondent cluster.
cells <- CellsByIdentities(sp, idents = sort((unique(sp@meta.data$seurat_clusters))))
str(cells)
## List of 19
## $ 0 : chr [1:130] "AAAGGCTACGGACCAT-1" "AAAGGCTCTCGCGCCG-1" "AAATTACCTATCGATG-1" "AACAGCTGTGTGGCAA-1" ...
## $ 1 : chr [1:117] "AAATAGGGTGCTATTG-1" "AAATCTAGCCCTGCTA-1" "AACGATAATGCCGTAG-1" "AACGCTGTTGCTGAAA-1" ...
## $ 2 : chr [1:109] "AAATCGTGTACCACAA-1" "AACCGAGCTTGGTCAT-1" "AAGATTGGCGGAACGT-1" "AAGGTATCCTAATATA-1" ...
## $ 3 : chr [1:106] "AACAACTGGTAGTTGC-1" "AACGCGGTCTCCAGCC-1" "AACTGGGTCCCGACGT-1" "AATGATGCGACTCCTG-1" ...
## $ 4 : chr [1:104] "AAACGAGACGGTTGAT-1" "AAATGGTCAATGTGCC-1" "AACCTTTAAATACGGT-1" "AACTCAAGTTAATTGC-1" ...
## $ 5 : chr [1:100] "AAACTGCTGGCTCCAA-1" "AAATACCTATAAGCAT-1" "AAATTGATAGTCCTTT-1" "AACACACGCTCGCCGC-1" ...
## $ 6 : chr [1:99] "AACGGCCATCTCCGGT-1" "AACGTACTGTGGGTAC-1" "AAGGAGCGGTTGGTGC-1" "AAGTCAATTGTCGTCA-1" ...
## $ 7 : chr [1:94] "AAACCGTTCGTCCAGG-1" "AACTAGGCTTGGGTGT-1" "AATAACACTAGAACAA-1" "AATAGAACAGAGTGGC-1" ...
## $ 8 : chr [1:85] "AACAGGAAATCGAATA-1" "AACGATAGAAGGGCCG-1" "AACGCGACCTTGGGCG-1" "AACTGATATTAGGCCT-1" ...
## $ 9 : chr [1:83] "AAATGGCATGTCTTGT-1" "AACGATATGTCAACTG-1" "AACGCATGATCTGGGT-1" "AACTCCAGAGCGTGTT-1" ...
## $ 10: chr [1:78] "AACGTGCGAAAGTCTC-1" "AAGAGCTCTTTATCGG-1" "AAGAGGCATGGATCGC-1" "AATCGCCTCAGCGCCA-1" ...
## $ 11: chr [1:76] "AACCTCGCTTTAGCCC-1" "AAGTTCACTCCAAGCT-1" "ACAAGGATGCTTTAGG-1" "ACGCCGCTAGACGACC-1" ...
## $ 12: chr [1:76] "AACAATACATTGTCGA-1" "AAGCGTCCCTCATCGA-1" "AAGTGAGTCGGGTTTA-1" "AATTAACGGATTTCCA-1" ...
## $ 13: chr [1:67] "AACCTTTACGACGTCT-1" "AATGCACCAAGCAATG-1" "AGCCCTTCTAATCCGA-1" "AGCTATTTAATCCAAC-1" ...
## $ 14: chr [1:55] "AAATTAACGGGTAGCT-1" "AACATATCAACTGGTG-1" "AACGTCAGACTAGTGG-1" "AATATCGAGGGTTCTC-1" ...
## $ 15: chr [1:53] "AAGTGCGTTAGAATCT-1" "AGAAGGTTGCCGAATT-1" "AGGACGACCCATTAGA-1" "AGTGATATGAGTAGTT-1" ...
## $ 16: chr [1:51] "AAACAGTGTTCCTGGG-1" "AACCCGACAACCCGTG-1" "AATACCGGAGGGCTGT-1" "AATCAGGTTTCATTTA-1" ...
## $ 17: chr [1:48] "AAACAGAGCGACTCCT-1" "AAATAACCATACGGGA-1" "AAATTACACGACTCTG-1" "ACGATACATAGAACTA-1" ...
## $ 18: chr [1:36] "ACACAAAGACGGGTGG-1" "ACAGGTGGAGGTGAGG-1" "ACGATCATACATAGAG-1" "AGCATCGTCGATAATT-1" ...
p <- SpatialDimPlot(sp,
cells.highlight = cells,
cols.highlight = c("#FFFF00", "grey50"), facet.highlight = T, combine = T) + NoLegend()
cluster_slide = SpatialDimPlot(sp, label = T, repel = T, label.size = 4)
patchwork::wrap_plots(HE_slide, cluster_slide, ncol=2)
Use the function FindAllMarkers() to get the list of
genes that represent each cluster keeping only positive markers
(FoldChange (FC) > 0).
In the function, choose proper thresholds for parameters in order to
limit testing procedures:
Rank markers per cluster according to the adjusted p-value.
Subset in another variable differentially expressed markers
having a log2(fold-change) ≥ 1 and
adjusted p-value ≤ 0.1.
Visualize espression of top 50 markers per cluster in a heatmap.
Questions
What are the parameters that you set in
FindAllMarkers()? What statistical test did you choose to
identify markers ?
How many positive markers did you identify per cluster? Of those, how many are differentially expressed ?
In Seurat adjusted p-values are calculated using what type of multiple testing correction algorithm?
require(presto) # this package speeds up the computational time
DefaultAssay(sp) <- "Visium10x"
# Find all markers for each cluster:
# specifying `only.pos` = TRUE we obtain only positive markers (having FC > 0 )
# We choose Wilcoxon test as it is fast returning reliable results as well. Moreover, it has been ranked among the best methods for statistical testing in benchmarking studies comparing methods in single cells context (refs: https://www.nature.com/articles/nmeth.4612, https://www.nature.com/articles/s42003-020-01247-y)
# We consider only genes detected in a minimum fraction of 0.01 and that show at least 0.1-fold difference (log-scale) to do not miss weaker signals.
markers <- FindAllMarkers(sp, assay = "Visium10x", only.pos = TRUE
, min.pct = 0.01, logfc.threshold = 0.1, test.use = "wilcox")
# Adds a rank column to the markers datafame, ranking them by adjusted p-value
markers <- ddply(markers, .(cluster), mutate, rank = rank(p_val_adj))
# Filter markers to select the ones with a log2fold change ≥ 1 and adjusted p-value ≤ 0.1
markers %>%
group_by(cluster) %>%
dplyr::filter(avg_log2FC >= 1 & p_val_adj <= 0.1) -> selected_markers
# Select the top 50 genes for each cluster
selected_markers %>% slice_head(n = 50) %>% ungroup() -> top50
# Scales and centers the data for the selected features (e.g top50$gene)
sp_subset <- ScaleData(sp, assay = "Visium10x", features = top50$gene)
# `features` specifies the features (genes) to be included in the heatmap
p <- DoHeatmap(sp_subset, assay = "Visium10x", features = top50$gene, size = 2.5) +
theme(axis.text = element_text(size = 5.5)) + NoLegend()
p
Let’s identify cancerous and not cancerous cells by annotating
markers on Hallmarks of cancer and cancer/healthy drivers retrieved from
Network of Cancer Genes and Healthy Drivers.
To do so, we will exploit:
In this phase, consider all positive markers you identified per cluster. In this way, you will get a comprehensive view of the genes composing each cluster.
To investigate Hallmarks of Cancer features :
Retrieve human Hallmarks gene set from MSigDB
Collections using the msigdbr() function by specifying
the abbreviation category correspondent to Hallmarks collection.
Perform an enrichment analysis with the enricher()
function on all positive markers identified for each cluster.
Employ the following cutoffs: pvalue = 1 and
qvalue = 1. For multiple testing correction method, use the
Benjamini–Hochberg method.
Select statistically significant enrichments (FDR < 0.1).
Generate a plot visualizing enriched hallmark-related pathways on y-axis, -log10(FDR) on the x-axis and color by cluster identifiers.
Now:
Download the manually curated collection of cancer genes and
healthy drivers from Network
of Cancer Genes & Healthy Drivers (NCG) portal. Alternatively,
you can find the already downloaded data in the input
folder.
| What | Path to |
|---|---|
| Drivers of cancer clone | ../input/NCG_cancerdrivers_annotation_supporting_evidence.tsv/ |
| Drivers of not cancer clone | ../input/NCG_healthydrivers_annotation_supporting_evidence.tsv |
Subset the cancer driver genes primarily associated with ovarian
cancer and healthy driver genes associated with the gynecologic organ
system.
In the dataframe containing all positive markers for each
cluster, add two columns to annotate whether each gene is a cancer
driver or a healty driver according to NCG collection.
Sum up the number of drivers of cancer clone and
drivers of not cancer clone expressed in each spot.
Add these annotations to the meta.data table of your
SeuratObject.
Visualize these two newly added features over the HE image of the tissue.
Considering enrichment of clusters in Hallmark pathways, what is the most enriched gene set in each cluster?
HINT: you can explore metrics such as RichFactor or GeneRatio.
By inspecting the enriched hallmarks of cancer, can you assign a
specific label to each cluster amongts the following list?
- normal
- early tumor onset
- late tumor stage
- inflammation
- DNA damage
How many cancer genes and healthy genes we are dealing with?
Looking at the HE image, are clusters expressing drivers of cancer or not cancer clone clearly separated within the tissue?
Based on the expression of cancer drivers, where would you expect to find cancer lesions in the tissue? Take the HE image of the tissue and draw a box where you expected to localize tumor cells in the tissue.
In which Hallmark pathways are the clusters that express cancer driver genes enriched? And in those featured by drivers of not cancer clones?
# Fetching Hallmark gene sets:
h_gene_sets = msigdbr(species = "human", category = "H")
# `enricher` performs enrichment analysis the genes of each cluster
res = vector("list", length(unique(markers$cluster)))
for(i in unique(markers$cluster)){
genes = unique(subset(markers, cluster==i)$gene)
genes = genes[!is.na(genes)]
y = enricher(genes
, TERM2GENE = h_gene_sets[,c('gs_name','gene_symbol')]
, pAdjustMethod = "BH"
, pvalueCutoff=1
, qvalueCutoff=1 )
y = y@result
y$cluster = i
res[[i]] = y
}
# Selecting statistically significant enrichements defined as enrichment FDR < 0.1
enr = bind_rows(res) %>% subset(p.adjust<=0.1)
# Visualization of Enrichment results
ggplot(enr,aes(x = -log10(p.adjust),y = fct_reorder(Description, p.adjust, .desc = T), color=cluster, shape=cluster)) +
geom_point(size=2)+scale_shape_manual(values=0:18)+scale_x_sqrt()+theme_bw()
enr = enr %>% mutate(geneRatio = as.numeric(sapply(strsplit(GeneRatio,'/'),`[`,1)) /
as.numeric(sapply(strsplit(GeneRatio,'/'),`[`,2))
, richFactor = as.numeric(sapply(strsplit(GeneRatio,'/'),`[`,1)) / as.numeric(sub("/\\d+", "", BgRatio)) )
# Loading NCG data:
cgc = subset(read.delim2('../input/NCG_cancerdrivers_annotation_supporting_evidence.tsv')
, primary_site %in% c('ovary'))
healthy = subset(read.delim2('../input/NCG_healthydrivers_annotation_supporting_evidence.tsv'),
organ_system=='Gynecologic')
message("The dimensions of the cancer genes dataframe are: ", paste(dim(cgc), collapse = " x "))
message("The dimensions of the healthy genes dataframe are: ", paste(dim(healthy), collapse = " x "))
#`markers$driver_cancer_clone` and `markers$driver_not_cancer_clone`: Flags markers as either cancer drivers or not.
markers$driver_cancer_clone = markers$gene %in% cgc$symbol
markers$driver_not_cancer_clone = markers$gene %in% healthy$symbol
# `overview`: Summarizes the number of markers and the count of cancer and non-cancer drivers per cluster.
overview = ddply(markers, .(cluster), summarise
, markers = n()
, driver_cancer_clone = sum(driver_cancer_clone)
, driver_not_cancer_clone = sum(driver_not_cancer_clone)
) %>% arrange(as.numeric(as.character(cluster)))
# Adding metadata to `SeuratObject` to visualize cancer driver information
cancer = overview$driver_cancer_clone[match(Idents(sp), overview$cluster)]
names(cancer) <- colnames(sp)
not_cancer = overview$driver_not_cancer_clone[match(Idents(sp), overview$cluster)]
names(not_cancer) <- colnames(sp)
sp <- AddMetaData(object = sp, metadata = cancer, col.name = 'driver_cancer_clone')
sp <- AddMetaData(object = sp, metadata = not_cancer, col.name = 'driver_not_cancer_clone')
# Visualizing how cancer and non-cancer driver genes are distributed
cgc_slide = SpatialFeaturePlot(sp, features = c('driver_cancer_clone', 'driver_not_cancer_clone')
, combine = T, ncol = 2
, keep.scale = 'all' )
cgc_slide
Let’s see whether we can strengthen the results by inferring cell
composition.
Cell type prediction can be done using an unsupervised approach based on
clustermole
package.
Clustermole performs cell type prediction based on a set of marker genes
or on a table of expression values. We will infer cell types from gene
expression matrix.
From SeuratObject retrieve assay data containing normalized
counts and transform it into a matrix.
Explore the clustermole package to find the function
to perform cell type composition inference based on normalized
expression counts.
Among available enrichment methods use ssGSEA, as it is
considered to be one of most accurate methods for gene set enrichment
(GSE) analyses (Lauria A., et al., NAR, 2020) .
Retrieve the full list of human cell type markers in the
clustermole database using clustermole_markers().
Subset the list to include only rows matching the patterns
Human and CellMarker in the
celltype_full column.
From this selection, further filter markers that meet at least
one of the following conditions:
ovary as organ|Pluripotent|Pancrea|Hepat|Sertoli|Oligo|Induced|Hematopoietic|Plasma|Mast|Pluripotentstemcell|Platelet|Megakaryocyte|Embr|Neur|Glia|glia|Purkinje|Pyrami|Germ|germ|Follic|neuro|Adventitial.Subset the cell type enrichments obtained in step 2, keeping only
those cell types selected in the previous step.
For each spot, select the most enriched cell type according to the
score_rank. Now, you can add cell type prediction per spot
in the meta.data table of your
SeuratObject.
In the cell type enrichment dataframe add a column that
associates each spot barcode name with its corresponding cluster number
ID. Calculate the proportion of each cell type within each
cluster.
Generate a barplot plotting on x-axis the different cluster and filled
by the proportion of cell types.
Questions
Considering results in the barplot, which clusters contain an higher fraction of cancer cells?
Calculate cell types per cluster and spatially visualize them on the tissue image. Which cell types compose clusters expressing cancer drivers measured in the previous section?
Is there any concordance between top expressed markers and inferred cell types per cluster?
What are the limitations of clustermole in the context of 10X Visium data?
DefaultAssay(sp) <- "Visium10x"
# Retrieve log-normalized data
log_norm_counts = GetAssayData(object = sp, layer = 'data') %>% as.matrix()
# Perform clustermole enrichment on gene expression matrix
l <- clustermole_enrichment(log_norm_counts, species = "hs", method = 'ssgsea')
## [1] "Calculating ranks..."
## [1] "Calculating absolute values from ranks..."
## [1] "Normalizing..."
# Filter clustermole markers to select those of interest in our case
db_mrks = clustermole_markers('hs') #retrieves marker data specific to human cells
cmh = ddply(subset(db_mrks, grepl('Human', celltype_full) & grepl('CellMarker',celltype_full)), .(db), summarise
, celltype_full=unique(celltype_full), db = unique(db))
cmh$n_info = sapply(strsplit(cmh$celltype_full, "\\|"), length)
cmh$celltype = sapply(strsplit(cmh$celltype_full,"\\|"), '[[',1)
cmh$organ = sapply(strsplit(cmh$celltype_full,"\\|"), '[[',2)
# Select marker associated to ovary as organ OR not containing those reported patterns
selected_cell_types = subset(cmh,
(n_info == 3 &
!grepl("\\(|Pluripotent|Pancrea|Hepat|Sertoli|Oligo|Induced|Hematopoietic|Plasma|Mast|Pluripotentstemcell|Platelet|Megakaryocyte|Embr|Neur|Glia|glia|Purkinje|Pyrami|Germ|germ|Follic|neuro|Adventitial", celltype_full))
| organ == " Ovary ")$celltype_full
rm(cmh, db_mrks)
# Subsets cell type enrichment keeping only cell types of interest
# We will select enrichments based on Cell Marker DB as it is a manually-curated resource including a fine-grained collection of experimentally supported markers of various cell types in different tissues of human and mouse.
enr_res = subset(l, celltype_full %in% selected_cell_types) #includes only the cell types
enr_res$celltype_full = gsub(" | Human | CellMarker", "", enr_res$celltype_full)
enr_res$celltype_full = gsub( "\\||", "", enr_res$celltype_full)
# Summarizing enrichment results and identifies the most enriched cell type per cluster
enr_res = ddply(enr_res,.(cluster), summarise, celltype_full = celltype_full[which(score_rank == min(score_rank))])
# Create a vector with top enriched cell types per spot
celltype = enr_res$celltype_full[match(Cells(sp), enr_res$cluster)]
# Name vector elements
names(celltype) <- colnames(sp)
#Adds the predicted cell types as metadata to the `SeuratObject`
sp <- AddMetaData(object = sp, metadata = celltype, col.name = 'celltype_full')
sp@meta.data$celltype_full[which(is.na(sp@meta.data$celltype_full))]= "Other_cell_types"
# Create a list with clusters ID and correspondent spot barcode names
spots = CellsByIdentities(sp, idents = sort((unique(sp@meta.data$seurat_clusters))))
# For each list element, we create a data.frame containing all spot barcode names and the ID of the cluster
spots = mapply(function(x,y){data.frame(cluster=x, cluster_slide = y)}, x = spots, y = names(spots), SIMPLIFY = F)
# Merge by row all list elements
spots = bind_rows(spots)
# Merges and summarizes the enrichment results with cell spots, calculating the percentage of spots per cluster
# Add the correspondent cluster ID to eache cell type
enr_res = left_join(enr_res, spots, by = "cluster")
# Compute the number of spots per cluster enriched for each inferred cell type
enr_res = ddply(enr_res, .(cluster_slide, celltype_full), summarise, n_spots = n() )
# Compute the number of spots per cluster
enr_res = ddply(enr_res, .(cluster_slide), mutate, total_spots = sum(n_spots))
# Compute the percentage of each cell type found per cluster
enr_res$perc_spot_per_cluster= enr_res$n_spot/enr_res$total_spot
# Visualizing cell type composition
enr_res$cluster_slide = factor(enr_res$cluster_slide, levels = sort(unique((enr_res$cluster_slide))))
celltype_barplot = ggplot(enr_res, aes(cluster_slide, perc_spot_per_cluster, fill=celltype_full))+
geom_bar(stat='identity', position = position_stack(), color='white')+
theme_bw() + ylab('Spot percentage per cluster')+
ggsci::scale_fill_igv()
celltype_barplot
cluster_slide
Detect LR interactions of each cell type using
LIANA.
Specifically, select natmi, cell_italk, call_cellchat and sca as methods
and OmniPath as
resource.
Ensure that a minimum of 5 cells per cell type is
considered for the analysis.
Include these additional columns in your results:
ligand.expr, receptor.expr,
ligand.pval, receptor.pval,
ligand.FDR, receptor.FDR.
Combine the results and specify the ranking of each interaction
according the aggregate_rank score.
Explore visualization function provided by LIANA. Highlight LR interactions involving cancer cells.
Assess ligand-receptors interactions among clusters. Do you observe concordant results in terms of LR pairs characterizing tumoral clusters?
Questions
Which are the top-5 ranked LR interactions per cell type?
Which are LR interactions involving cancer cells as target? Is there a most recurrent interactor?
sp <- SetIdent(sp, value = "seurat_clusters")
# Detect ligand-receptor interactions of *each cell type * using LIANA function liana_wrap().
# Specifically, we use `natmi`, `cell_italk`, `call_cellchat` and `sca` as methods and `OmniPath` as resource.
# Require a minimum of 5 cells per cell type to be considered for analysis.
# Specify to add these following supplementary columns: `ligand.expr`, `receptor.expr`, `ligand.pval`, `receptor.pval`, `ligand.FDR`, `receptor.FDR`.\=
liana = liana_wrap(sp,
method = c( "natmi"
, "call_italk"
, "call_cellchat"
, "sca"
),
resource = c("OmniPath"),
idents_col = 'celltype_full',
min_cells = 5,
return_all = FALSE,
supp_columns = c("ligand.expr", "receptor.expr", "ligand.pval", "receptor.pval", "ligand.FDR",
"receptor.FDR"),
verbose = TRUE,
assay = "Visium10x",
.simplify = TRUE,
cell.adj = NULL
)
## [1] "Create a CellChat object from a data matrix"
## Set cell identities for the new CellChat object
## The cell groups used for CellChat analysis are EndothelialcellOvary, Epithelialcell(OvarianCancer)Ovary, Fibroblast, M1macrophage, M2macrophage
## Issue identified!! Please check the official Gene Symbol of the following genes:
## 2
## Issue identified!! Please check the official Gene Symbol of the following genes:
## DEFB4B CGB8 DEFB106B CCL4L1
## The number of highly variable ligand-receptor pairs used for signaling inference is 1444
## triMean is used for calculating the average gene expression per cell group.
## [1] ">>> Run CellChat on sc/snRNA-seq data <<< [2024-09-01 15:01:36.071059]"
## [1] ">>> CellChat inference is done. Parameter values are stored in `object@options$parameter` <<< [2024-09-01 15:03:59.257296]"
# Liana returns a list of results, each element of which corresponds to a method
# By liana_aggregate() we aggregate scores
liana.summary = liana %>% dplyr::glimpse()
## List of 4
## $ natmi : tibble [28,462 × 18] (S3: tbl_df/tbl/data.frame)
## ..$ source : chr [1:28462] "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" ...
## .. ..- attr(*, "levels")= chr [1:5] "EndothelialcellOvary" "Epithelialcell(OvarianCancer)Ovary" "Fibroblast" "M1macrophage" ...
## ..$ target : chr [1:28462] "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" ...
## .. ..- attr(*, "levels")= chr [1:5] "EndothelialcellOvary" "Epithelialcell(OvarianCancer)Ovary" "Fibroblast" "M1macrophage" ...
## ..$ ligand.complex : chr [1:28462] "PRNP" "PODXL" "ADGRB1" "ADAM10" ...
## ..$ ligand : chr [1:28462] "PRNP" "PODXL" "ADGRB1" "ADAM10" ...
## ..$ receptor.complex: chr [1:28462] "TNFRSF25" "SELL" "RTN4R" "TSPAN15" ...
## ..$ receptor : chr [1:28462] "TNFRSF25" "SELL" "RTN4R" "TSPAN15" ...
## ..$ receptor.prop : num [1:28462] 0.307 0.165 0.233 0.369 0.199 ...
## ..$ ligand.prop : num [1:28462] 0.756 0.665 0.125 0.642 0.642 ...
## ..$ ligand.expr : num [1:28462] 0.7242 0.5543 0.0639 0.5357 0.5357 ...
## ..$ receptor.expr : num [1:28462] 0.2027 0.0857 0.1412 0.22 0.0961 ...
## ..$ ligand.sum : num [1:28462] 4.805 1.159 0.101 2.99 2.99 ...
## ..$ receptor.sum : num [1:28462] 0.623 0.315 1.135 0.545 0.225 ...
## ..$ ligand.pval : num [1:28462] 0.70731 0.00896 0.2919 0.76431 0.76431 ...
## ..$ receptor.pval : num [1:28462] 0.608 0.891 0.332 0.112 0.167 ...
## ..$ ligand.FDR : num [1:28462] 1 0.247 1 1 1 ...
## ..$ receptor.FDR : num [1:28462] 1 1 1 0.88 1 ...
## ..$ prod_weight : num [1:28462] 0.14681 0.04748 0.00902 0.11787 0.05146 ...
## ..$ edge_specificity: num [1:28462] 0.049 0.13 0.0787 0.0723 0.0764 ...
## $ call_italk : tibble [13,185 × 9] (S3: tbl_df/tbl/data.frame)
## ..$ source : Factor w/ 5 levels "EndothelialcellOvary",..: 2 2 2 2 2 2 2 2 2 2 ...
## ..$ ligand : chr [1:13185] "PRNP" "BMPR1B" "TNFRSF11B" "TNFRSF11B" ...
## ..$ target : Factor w/ 5 levels "EndothelialcellOvary",..: 2 2 2 2 2 2 2 2 2 2 ...
## ..$ receptor : chr [1:13185] "TNFRSF25" "ENG" "NRP1" "ACE2" ...
## ..$ logFC_from: num [1:13185] -0.664 1.641 1.769 1.769 1.769 ...
## ..$ logFC_to : num [1:13185] 1.01 -0.81 -1 3.3 1.51 ...
## ..$ qval_from : num [1:13185] 8.89e-05 1.00 1.05e-01 1.05e-01 1.05e-01 ...
## ..$ qval_to : num [1:13185] 8.58e-07 1.31e-05 1.00 1.22e-15 1.00 ...
## ..$ logfc_comb: num [1:13185] 0.184 0.184 0.184 0.184 0.184 ...
## $ call_cellchat: tibble [11,595 × 6] (S3: tbl_df/tbl/data.frame)
## ..$ source : Factor w/ 5 levels "EndothelialcellOvary",..: 1 2 3 4 5 1 2 3 4 5 ...
## ..$ target : Factor w/ 5 levels "EndothelialcellOvary",..: 1 1 1 1 1 2 2 2 2 2 ...
## ..$ ligand : chr [1:11595] "PRNP" "PRNP" "PRNP" "PRNP" ...
## ..$ receptor: chr [1:11595] "TNFRSF25" "TNFRSF25" "TNFRSF25" "TNFRSF25" ...
## ..$ prob : num [1:11595] 0.0041 0.00501 0.00604 0.00747 0.00841 ...
## ..$ pval : num [1:11595] 0 0 0 0 0 0 0 0 0 0 ...
## $ sca : tibble [28,462 × 16] (S3: tbl_df/tbl/data.frame)
## ..$ source : chr [1:28462] "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" ...
## .. ..- attr(*, "levels")= chr [1:5] "EndothelialcellOvary" "Epithelialcell(OvarianCancer)Ovary" "Fibroblast" "M1macrophage" ...
## ..$ target : chr [1:28462] "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" "Epithelialcell(OvarianCancer)Ovary" ...
## .. ..- attr(*, "levels")= chr [1:5] "EndothelialcellOvary" "Epithelialcell(OvarianCancer)Ovary" "Fibroblast" "M1macrophage" ...
## ..$ ligand.complex : chr [1:28462] "PRNP" "PODXL" "ADGRB1" "ADAM10" ...
## ..$ ligand : chr [1:28462] "PRNP" "PODXL" "ADGRB1" "ADAM10" ...
## ..$ receptor.complex: chr [1:28462] "TNFRSF25" "SELL" "RTN4R" "TSPAN15" ...
## ..$ receptor : chr [1:28462] "TNFRSF25" "SELL" "RTN4R" "TSPAN15" ...
## ..$ receptor.prop : num [1:28462] 0.307 0.165 0.233 0.369 0.199 ...
## ..$ ligand.prop : num [1:28462] 0.756 0.665 0.125 0.642 0.642 ...
## ..$ ligand.expr : num [1:28462] 0.7242 0.5543 0.0639 0.5357 0.5357 ...
## ..$ receptor.expr : num [1:28462] 0.2027 0.0857 0.1412 0.22 0.0961 ...
## ..$ global_mean : num [1:28462] 0.158 0.158 0.158 0.158 0.158 ...
## ..$ ligand.pval : num [1:28462] 0.70731 0.00896 0.2919 0.76431 0.76431 ...
## ..$ receptor.pval : num [1:28462] 0.608 0.891 0.332 0.112 0.167 ...
## ..$ ligand.FDR : num [1:28462] 1 0.247 1 1 1 ...
## ..$ receptor.FDR : num [1:28462] 1 1 1 0.88 1 ...
## ..$ LRscore : num [1:28462] 0.708 0.58 0.376 0.685 0.59 ...
liana.summary <- liana.summary %>% liana_aggregate()
# Add rank to each LR pair interaction
liana.summary=ddply(liana.summary, .(source,target), mutate, ranking = rank(aggregate_rank))
# Select the top 5 scoring interactions
top.ranked = subset(liana.summary, ranking <= 5)
# Select LR interactions from selected cell types to tumoral cells represenetd by epithelial cells
lr = subset(top.ranked, source %in% c("Macrophage", "M1macrophage", "M2macrophage","ExhaustedTcell") & target=="Epithelialcell(OvarianCancer)Ovary")
DT::datatable(top.ranked, rownames = F,style = "auto", class = 'cell-border stripe', options = list(scrollX = T, fixedColumns = list(leftColumns =1 )))
DT::datatable(lr, rownames = F,style = "auto", class = 'cell-border stripe', options = list(scrollX = T, fixedColumns = list(leftColumns =1 )))
#heatmap to visualize the frequency of ligand-receptor interactions among cell-types
heat_freq(liana.summary)
# Plotting frequency ChordDiagram related epithelial cells involeved LR
chord_freq(liana.summary,
source_groups = c("Epithelialcell(OvarianCancer)Ovary"))
sp <- SetIdent(sp, value = "seurat_clusters")
# CellChat does not work with cluster id, so we convert cluster number id using a 1-based numeration
sp$numbered_clusters = as.numeric(sp$seurat_clusters)
# Detect ligand-receptor interactions of clusters using LIANA function liana_wrap().
# Specifically, we use `natmi`, `cell_italk`, `call_cellchat` and `sca` as methods and `OmniPath` as resource.
# Require a minimum of 5 cells per cluster to be considered for analysis.
# Specify to add these following supplementary columns: `ligand.expr`, `receptor.expr`, `ligand.pval`, `receptor.pval`, `ligand.FDR`, `receptor.FDR`.\=
liana_clusters = liana_wrap(sp,
method = c( "natmi"
, "call_italk"
, "call_cellchat"
, "sca"
),
resource = c("OmniPath"),
idents_col = 'numbered_clusters',
min_cells = 5,
return_all = FALSE,
supp_columns = c("ligand.expr", "receptor.expr", "ligand.pval", "receptor.pval", "ligand.FDR",
"receptor.FDR"),
verbose = TRUE,
assay = "Visium10x",
.simplify = TRUE,
cell.adj = NULL
)
## [1] "Create a CellChat object from a data matrix"
## Set cell identities for the new CellChat object
## The cell groups used for CellChat analysis are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
## Issue identified!! Please check the official Gene Symbol of the following genes:
## 2
## Issue identified!! Please check the official Gene Symbol of the following genes:
## DEFB4B CGB8 DEFB106B CCL4L1
## The number of highly variable ligand-receptor pairs used for signaling inference is 2117
## triMean is used for calculating the average gene expression per cell group.
## [1] ">>> Run CellChat on sc/snRNA-seq data <<< [2024-09-01 15:05:46.081358]"
## [1] ">>> CellChat inference is done. Parameter values are stored in `object@options$parameter` <<< [2024-09-01 15:08:42.551527]"
# Liana returns a list of results, each element of which corresponds to a method
# By liana_aggregate() we aggregate scores
liana_clusters.summary = liana_clusters %>% dplyr::glimpse()
## List of 4
## $ natmi : tibble [362,098 × 18] (S3: tbl_df/tbl/data.frame)
## ..$ source : chr [1:362098] "18" "18" "18" "18" ...
## .. ..- attr(*, "levels")= chr [1:19] "1" "2" "3" "4" ...
## ..$ target : chr [1:362098] "18" "18" "18" "18" ...
## .. ..- attr(*, "levels")= chr [1:19] "1" "2" "3" "4" ...
## ..$ ligand.complex : chr [1:362098] "PRNP" "DLL1" "DLL1" "DLL1" ...
## ..$ ligand : chr [1:362098] "PRNP" "DLL1" "DLL1" "DLL1" ...
## ..$ receptor.complex: chr [1:362098] "TNFRSF25" "NOTCH1" "NOTCH2" "NOTCH4" ...
## ..$ receptor : chr [1:362098] "TNFRSF25" "NOTCH1" "NOTCH2" "NOTCH4" ...
## ..$ receptor.prop : num [1:362098] 0.5 0.625 0.771 0.125 0.708 ...
## ..$ ligand.prop : num [1:362098] 0.958 0.188 0.188 0.188 0.188 ...
## ..$ ligand.expr : num [1:362098] 0.8412 0.0646 0.0646 0.0646 0.0646 ...
## ..$ receptor.expr : num [1:362098] 0.2729 0.3163 0.5551 0.0749 0.4933 ...
## ..$ ligand.sum : num [1:362098] 17.642 0.803 0.803 0.803 0.803 ...
## ..$ receptor.sum : num [1:362098] 2.148 3.735 11.971 0.784 6.02 ...
## ..$ ligand.pval : num [1:362098] 0.934 0.383 0.383 0.383 0.383 ...
## ..$ receptor.pval : num [1:362098] 0.44 0.426 0.977 0.914 0.877 ...
## ..$ ligand.FDR : num [1:362098] 1 1 1 1 1 ...
## ..$ receptor.FDR : num [1:362098] 1 1 1 1 1 ...
## ..$ prod_weight : num [1:362098] 0.22959 0.02044 0.03589 0.00484 0.03189 ...
## ..$ edge_specificity: num [1:362098] 0.00606 0.00682 0.00373 0.00768 0.0066 ...
## $ call_italk : tibble [185,400 × 9] (S3: tbl_df/tbl/data.frame)
## ..$ source : Factor w/ 19 levels "1","2","3","4",..: 18 18 18 18 18 18 18 18 18 18 ...
## ..$ ligand : chr [1:185400] "BTN3A1" "DLL1" "DLL1" "PODXL" ...
## ..$ target : Factor w/ 19 levels "1","2","3","4",..: 18 18 18 18 18 18 18 18 18 18 ...
## ..$ receptor : chr [1:185400] "LRRC4C" "NOTCH1" "NOTCH3" "SELL" ...
## ..$ logFC_from: num [1:185400] 1.05 0.582 0.582 0.935 2.228 ...
## ..$ logFC_to : num [1:185400] 2.187 0.482 0.435 1.457 0.98 ...
## ..$ qval_from : num [1:185400] 1.62e-04 1.00 1.00 7.84e-08 1.00 ...
## ..$ qval_to : num [1:185400] 1.00 4.55e-01 9.98e-01 2.88e-07 1.00 ...
## ..$ logfc_comb: num [1:185400] 0.0917 0.0917 0.0917 0.0917 0.0917 ...
## $ call_cellchat: tibble [172,838 × 6] (S3: tbl_df/tbl/data.frame)
## ..$ source : int [1:172838] 11 11 11 11 11 11 11 11 11 11 ...
## ..$ target : int [1:172838] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ ligand : chr [1:172838] "CD300LB" "CD300LB" "CD300LB" "CD300LB" ...
## ..$ receptor: chr [1:172838] "TYROBP" "TYROBP" "TYROBP" "TYROBP" ...
## ..$ prob : num [1:172838] 0.00579 0.0042 0.00564 0.00602 0.00569 ...
## ..$ pval : num [1:172838] 0 0 0 0 0 0 0 0 0 0 ...
## $ sca : tibble [362,098 × 16] (S3: tbl_df/tbl/data.frame)
## ..$ source : chr [1:362098] "18" "18" "18" "18" ...
## .. ..- attr(*, "levels")= chr [1:19] "1" "2" "3" "4" ...
## ..$ target : chr [1:362098] "18" "18" "18" "18" ...
## .. ..- attr(*, "levels")= chr [1:19] "1" "2" "3" "4" ...
## ..$ ligand.complex : chr [1:362098] "PRNP" "DLL1" "DLL1" "DLL1" ...
## ..$ ligand : chr [1:362098] "PRNP" "DLL1" "DLL1" "DLL1" ...
## ..$ receptor.complex: chr [1:362098] "TNFRSF25" "NOTCH1" "NOTCH2" "NOTCH4" ...
## ..$ receptor : chr [1:362098] "TNFRSF25" "NOTCH1" "NOTCH2" "NOTCH4" ...
## ..$ receptor.prop : num [1:362098] 0.5 0.625 0.771 0.125 0.708 ...
## ..$ ligand.prop : num [1:362098] 0.958 0.188 0.188 0.188 0.188 ...
## ..$ ligand.expr : num [1:362098] 0.8412 0.0646 0.0646 0.0646 0.0646 ...
## ..$ receptor.expr : num [1:362098] 0.2729 0.3163 0.5551 0.0749 0.4933 ...
## ..$ global_mean : num [1:362098] 0.158 0.158 0.158 0.158 0.158 ...
## ..$ ligand.pval : num [1:362098] 0.934 0.383 0.383 0.383 0.383 ...
## ..$ receptor.pval : num [1:362098] 0.44 0.426 0.977 0.914 0.877 ...
## ..$ ligand.FDR : num [1:362098] 1 1 1 1 1 ...
## ..$ receptor.FDR : num [1:362098] 1 1 1 1 1 ...
## ..$ LRscore : num [1:362098] 0.752 0.476 0.546 0.306 0.531 ...
liana_clusters.summary <- liana_clusters.summary %>% liana_aggregate()
# Add rank to each LR pair interaction
liana_clusters.summary = ddply(liana_clusters.summary, .(source,target), mutate, ranking = rank(aggregate_rank))
# Select the top 5 scoring interactions
top.ranked_clusters = subset(liana_clusters.summary, ranking <= 5)
# Select LR interactions to clusters compoed of tumoral cells
# Bear in mind we add +1 to cluster number ids
lr_clusters = subset(top.ranked_clusters, target %in% c(14, 18, 12))
DT::datatable(top.ranked_clusters, rownames = F,style = "auto", class = 'cell-border stripe', options = list(scrollX = T, fixedColumns = list(leftColumns =1 )))
DT::datatable(lr_clusters, rownames = F,style = "auto", class = 'cell-border stripe', options = list(scrollX = T, fixedColumns = list(leftColumns =1 )))